Fundamentals Formal Foundations and Semantics of Data Extraction
Abstract
SYNONYMS
web data extraction toolkit, web information extraction system, wrapper generator, wrapper generator toolkit, web macros, web scraper.

DEFINITION
A web data extraction system is a software system that automatically and repeatedly extracts data from web pages whose content changes, and delivers the extracted data to a database or some other application. The task of web data extraction performed by such a system is usually divided into five functions:
(1) web interaction, which mainly comprises navigating to (usually predetermined) target web pages that contain the desired information;
(2) support for wrapper generation and execution, where a wrapper is a program that identifies the desired data on target pages, extracts it, and transforms it into a structured format;
(3) scheduling, which allows previously generated wrappers to be applied repeatedly to their respective target pages;
(4) data transformation, which includes filtering, transforming, refining, and integrating data extracted from one or more sources, and structuring the result according to a desired output format (usually XML or relational tables); and
(5) delivery of the resulting structured data to external applications such as database management systems, data warehouses, business software systems, content management systems, decision support systems, RSS publishers, email servers, or SMS servers.
Alternatively, the output can be used to generate new web services out of existing and continually changing web sources.
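The five functions can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the design of any particular system: the "web interaction" step is a hypothetical `fetch_page` that returns canned HTML so the example runs offline, the wrapper is a hand-written `html.parser` subclass keyed on assumed `class` attributes (`item`, `name`, `price`), and the output targets are XML (via `xml.etree.ElementTree`) and an in-memory SQLite table. All function and class names here are illustrative.

```python
from __future__ import annotations

import sqlite3
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


# (1) Web interaction: hypothetical stand-in for navigating to a target page.
# A real system would fetch the URL over HTTP; canned HTML keeps this offline.
def fetch_page(url: str) -> str:
    return (
        "<html><body><table>"
        "<tr class='item'><td class='name'>Widget</td><td class='price'>9.99</td></tr>"
        "<tr class='item'><td class='name'>Gadget</td><td class='price'>24.50</td></tr>"
        "</table></body></html>"
    )


# (2) Wrapper: identifies the desired data (cells with an assumed CSS class)
# and extracts it into structured records (one dict per table row).
class ProductWrapper(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.records: list[dict[str, str]] = []
        self._field: str | None = None  # class of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "tr" and cls == "item":
            self.records.append({})  # start a new record for each item row
        elif tag == "td" and cls in ("name", "price") and self.records:
            self._field = cls

    def handle_data(self, data):
        if self._field is not None:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None


def run_wrapper(html: str) -> list[dict[str, str]]:
    wrapper = ProductWrapper()
    wrapper.feed(html)
    wrapper.close()
    return wrapper.records


# (4) Data transformation: refine the price into a number and structure as XML.
def to_xml(records: list[dict[str, str]]) -> ET.Element:
    root = ET.Element("products")
    for rec in records:
        item = ET.SubElement(root, "product", name=rec["name"])
        item.text = f"{float(rec['price']):.2f}"
    return root


# (5) Delivery: load the structured data into a relational table.
def deliver(root: ET.Element, conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS product (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO product VALUES (?, ?)",
        [(p.get("name"), float(p.text)) for p in root.iter("product")],
    )
    conn.commit()


# (3) Scheduling: reapply the same wrapper to its target page repeatedly.
# A real scheduler would wait between runs (e.g. cron or a timer);
# this loop just runs a fixed number of times back to back.
def run_schedule(url: str, runs: int, conn: sqlite3.Connection) -> None:
    for _ in range(runs):
        deliver(to_xml(run_wrapper(fetch_page(url))), conn)
```

For example, `run_schedule("https://example.com/products", 2, sqlite3.connect(":memory:"))` would apply the wrapper twice and insert two rows per run; the URL is a placeholder. Keeping each function separate mirrors the division of labor in the definition, so the wrapper can be swapped out without touching scheduling or delivery.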
Publication date: 2008